(54) Motion compensated interpolation for digital video signal processing



(57) A method determines true motion vectors associated with a sequence of images. The images include
fields made up of blocks of pixels. The method selects candidate feature blocks from the blocks of pixels.
The candidate feature blocks have intensity variances above a threshold indicative of texture features.
Candidate feature blocks in similarly numbered adjacent field intervals are compared to determine sets of
displaced frame difference parameters for each candidate feature block. The true motion vectors for each
candidate feature block are determined from a minimum weighted score derived from the difference parameters.
Description

Related Application/Patents . 

[0001] In an earlier filed U.S. Application Serial No. 08/800,880, entitled "Adaptive Video Coding Method," filed in
the name of Huifang Sun and Anthony Vetro, now U.S. Patent No. 5,790,196, a system was disclosed for coding video
signals for storage and/or transmission using joint rate control for multiple video objects based on a quadratic rate dis- 
tortion model. In a second application, Serial No. 08/896,861 , entitled "Improved Adaptive Video Coding Method," Sun 
and Vetro described an improved method of target distribution and introduction of a tool to take into account object
shape in the rate control process. The disclosures of these previously-filed applications are incorporated by reference
in the present application, particularly insofar as they describe the general configurations of encoding and decoding, 
including motion estimation and motion compensation functions in such systems. 

BACKGROUND OF THE INVENTION 


[0002] For an overall understanding of general techniques involved in encoding and decoding video image informa- 
tion, reference should be made to the "MPEG-4 Video Verification Model Version 5.0", prepared for the International 
Organization for Standardization by the Ad hoc group on MPEG-4 video VM editing, paper Number MPEG 96/N1469, 
November 1996, the contents of which are herein incorporated by reference. 

[0003] This invention relates to encoding and decoding of complex video image information including motion components
which may be encountered, for example, in multimedia applications, such as video-conferencing, video-phone,
and video games. In order to be able to transfer complex video information from one machine to another, it is often
desirable or even necessary to employ video compression techniques. One significant approach to achieving a high 
compression ratio is to remove the temporal and spatial redundancy which is present in a video sequence. To remove
spatial redundancy, an image can be divided into disjoint blocks of equal size. These blocks are then subjected to a
transformation (e.g., Discrete Cosine Transformation or DCT), which decorrelates the data so that it is represented as 
discrete frequency components. With this representation, the block energy is more compact, hence the coding of each 
block can be more efficient. Furthermore, to achieve the actual compression, two-dimensional block elements are 
quantized. At this point, known run-length and Huffman coding schemes can be applied to convert the quantized data
into a bit-stream. If the above process is applied to one block independent of any other block, the block is said to be
intra-coded. On the other hand, if the block uses information from another block at a different time, then the block is said 
to be inter-coded. Intercoding techniques are used to remove temporal redundancy. The basic approach is that a resid- 
ual block (or error block) is determined based on the difference between the current block and a block in a reference 
picture. A vector between these two blocks is then determined and is designated as a motion vector. To keep the energy
in the residual block as small as possible, block-matching algorithms (BMAs) are used to determine the block in the reference
picture with the greatest correlation to the current block. With the reference block locally available, the current
block is reconstructed using the motion vector and the residual block. For the most part, video coding schemes encode 
each motion vector differentially with respect to its neighbors. The present inventors have observed that a piecewise 
continuous motion field can reduce the bit rate in this case. Hence, a rate-optimized motion estimation algorithm has
been developed. The unique features of this proposal come from two elements: (1) the number of bits used for encoding
motion vectors is incorporated into the minimization criterion, and (2) rather than counting the actual number of bits for 
motion vectors, the number of motion vector bits is estimated using the residues of the neighboring blocks. With these 
techniques, the bit-rate is lower than in prior encoders using full-search motion-estimation algorithms. In addition, the 
computational complexity is much lower than in a method in which rate-distortion is optimized. The resulting motion field
is a true motion field, hence the subjective image quality is improved as well.

[0004] If we disregard for a moment the advantages that are achieved in terms of coding quality and bit rate sav- 
ings, and only concentrate on the improvements in subjective image quality, it can be demonstrated that the resulting 
true motion field can be used at the decoder, as well, in a variety of other ways. More specifically, it has been found that 
the true motion field can be used to reconstruct missing data, where the data may be a missing frame and/or a missing
field. In terms of applications, this translates into frame-rate up-conversion, error concealment and interlaced-to-progressive
scan rate conversion capabilities, making use of the true motion information at the decoder end of the system.
[0005] Frame-Rate Up-Conversion . The use of frame-rate up-conversion has drawn considerable attention in 
recent years. To accomplish acceptable coding results at very low bit-rates, most encoders reduce the temporal resolution,
i.e., instead of targeting the full frame rate of 30 frames/sec (fps), the frame rate may be reduced to 10 fps, which
would mean that 2 out of every 3 frames are never even considered by the encoder. However, to display the full frame
rate at the decoder, a recovery mechanism is needed. The simplest mechanism is to repeat each frame until a new one 
is received. The problem with that interpolation scheme is that the image sequence will appear very discontinuous or 
jerky, especially in areas where large or complex motion occurs. Another simple mechanism is linear-interpolation 
between coded frames. The problem with this mechanism is that the image sequence will appear blurry in areas of
motion, resulting in what is referred to as ghost artifacts. 
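
The two simple recovery mechanisms described above can be sketched as follows. This is an illustrative sketch only: the function names `repeat_frame` and `linear_interpolate`, the nested-list frame representation, and the blending parameter `alpha` are assumptions introduced for the example and are not taken from the disclosure.

```python
from typing import List

Frame = List[List[float]]  # rows of pixel intensities (assumed representation)


def repeat_frame(prev: Frame) -> Frame:
    """Frame repetition: the missing frame is a copy of the last decoded frame."""
    return [row[:] for row in prev]


def linear_interpolate(prev: Frame, nxt: Frame, alpha: float = 0.5) -> Frame:
    """Pixel-wise blend of two decoded frames.

    alpha is the assumed temporal position of the missing frame,
    from 0.0 (at prev) to 1.0 (at nxt).
    """
    return [
        [(1.0 - alpha) * a + alpha * b for a, b in zip(row_p, row_n)]
        for row_p, row_n in zip(prev, nxt)
    ]


# A bright pixel moves two positions to the right between decoded frames,
# so its true position in the missing middle frame is the centre column.
prev_frame = [[255.0, 0.0, 0.0]]
next_frame = [[0.0, 0.0, 255.0]]

repeated = repeat_frame(prev_frame)                   # pixel stays at the left: jerky motion
blended = linear_interpolate(prev_frame, next_frame)  # two half-bright copies: ghost artifact
```

In this toy case neither result places the bright pixel in the centre column, which is the behaviour that motion-compensated interpolation is intended to correct.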

[0006] From the above, it appears that motion is the major obstacle to image recovery in this manner. This fact has 
been observed by a number of prior researchers and it has been shown that motion-compensated interpolation can
provide better results. In one approach, up-sampling results are presented using decoded frames at low bit-rates. However,
the receiver must perform a separate motion estimation just for the interpolation. In a second approach, an algorithm
that considers multiple motion is proposed. However, this method assumes that a uniform translational motion
exists between two successive frames. In still a third approach, a motion-compensated interpolation scheme is performed,
based on an object-based interpretation of the video. The main advantage of the latter scheme is that the
decoded motion and segmentation information is used without refinement. This may be attributed to the fact that the
object-based representation is true in the "real" world. However, a proprietary codec used in that approach is not readily 
available to all users. 

[0007] The method proposed in the present case is applicable to most video coding standards in that it does not 
require any proprietary information to be transmitted and it does not require an extra motion estimation computation.
The present motion-compensated interpolation scheme utilizes the decoded motion information which is used for inter-coding.
Since the current true motion estimation process provides a more accurate representation of the motion within
a scene, it becomes possible to more readily reconstruct information at the decoder which needs to be recovered before 
display. Besides quality, the major advantage of this method over other motion compensated interpolation methods is 
that significantly less computation is required on the decoder side. 

[0008] Error Concealment. True motion vector information can also be employed to provide improved error concealment.
In particular, post-processing operations at the decoder can be employed to recover damaged or lost video areas
based on characteristics of images and video signals. 

[0009] Interlaced-to-Progressive Scan Conversion . In addition to the above motion-compensated interpolation 
method, a related method for performing interlaced-to-progressive scan conversion also becomes available. In this scenario,
rather than recovering an entire frame, an entire field is recovered. This type of conversion is necessary for cases
in which a progressive display is intended to display compressed inter-frame, and motion-compensated inter-frame. 
[0010] Intraframe methods . Simple intraframe techniques interpolate a missing line on the basis of two scanned 
lines which occur immediately before and after the missing line. One simple example is the "line averaging" which 
replaces a missing line by averaging the two lines adjacent to it. Some other improved intraframe methods which use
more complicated filters or edge information have been proposed by M. H. Lee et al. (See "A New Algorithm for Interlaced
to Progressive Scan Conversion Based on Directional Correlations and its IC Design," IEEE Transactions on Consumer
Electronics, Vol. 40, No. 2, pp. 119-129, May 1994). However, such intraframe techniques cannot predict
information which is lost from the current field, but which appears in neighboring fields. 
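
The "line averaging" technique mentioned above can be sketched as follows. The function name `line_average_deinterlace`, the `top_field` flag, and the handling of the first or last line (copying the single available neighbour) are assumptions made for this illustration.

```python
from typing import List, Optional

Line = List[float]


def line_average_deinterlace(field: List[Line], top_field: bool = True) -> List[Line]:
    """Rebuild a full frame from one field by averaging adjacent scanned lines.

    Assumption: a top field carries frame lines 0, 2, 4, ... and a bottom
    field carries frame lines 1, 3, 5, ...
    """
    height = 2 * len(field)
    frame: List[Optional[Line]] = [None] * height
    offset = 0 if top_field else 1
    for k, line in enumerate(field):
        frame[2 * k + offset] = list(line)  # place the scanned lines

    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:          # missing first line: copy the line below
            frame[y] = list(below)
        elif below is None:        # missing last line: copy the line above
            frame[y] = list(above)
        else:                      # average the two adjacent scanned lines
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame


top = [[10.0, 20.0], [30.0, 40.0]]
full_frame = line_average_deinterlace(top, top_field=True)
```

Because only the current field is consulted, detail that is present solely in a neighbouring field cannot be recovered by this sketch, as the paragraph above notes.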

[0011] Interframe techniques take into account the pixels in the previous frame in the interpolation procedure.
One simple and widely-adopted method is the field staying scheme which lets
$I(m, 2n+((t-1) \bmod 2), t) = I(m, 2n+((t-1) \bmod 2), t-1)$. Non-motion-compensated approaches, which apply linear or nonlinear
filters, are fine for stationary objects, but they result in severe artifacts for moving objects.
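
The field staying scheme can be illustrated with a minimal sketch. It assumes, consistent with the expression above, that the field received at time t carries the frame lines of parity t mod 2, so that the missing lines of parity (t-1) mod 2 are copied from the field at time t-1. The function name `field_staying` and the list-based field layout are hypothetical.

```python
from typing import List

Line = List[float]


def field_staying(current_field: List[Line], previous_field: List[Line], t: int) -> List[Line]:
    """Build the frame at time t by keeping the lines of the previous field.

    current_field[n]  holds frame line 2n + (t mod 2)       at time t.
    previous_field[n] holds frame line 2n + ((t-1) mod 2)   at time t-1.
    """
    cur_parity = t % 2
    frame: List[Line] = []
    for n in range(len(current_field)):
        pair = {cur_parity: current_field[n], 1 - cur_parity: previous_field[n]}
        frame.append(list(pair[0]))  # even frame line 2n
        frame.append(list(pair[1]))  # odd frame line 2n + 1
    return frame


even_lines = [[1.0, 1.0], [3.0, 3.0]]  # frame lines 0 and 2
odd_lines = [[2.0, 2.0], [4.0, 4.0]]   # frame lines 1 and 3

frame_t2 = field_staying(even_lines, odd_lines, t=2)  # even field is current
frame_t3 = field_staying(odd_lines, even_lines, t=3)  # odd field is current
```

For a stationary scene both calls return the same frame; for a moving object the two fields show the object at different positions, which is the source of the artifacts described above.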

[0012] For moving objects, it has been found that motion compensation should be used in order to achieve higher 
quality. Some motion compensated de-interlacing techniques have been proposed. For example, it has been shown 
that motion compensated de-interlacing methods are better than the intraframe methods and non-motion-compensated
interframe methods (see Lee et al., "Video Format Conversions between HDTV Systems," IEEE Transactions on Con- 
sumer Electronics, Vol. 39, No. 3, pp. 219-224, Aug. 1993). 

[0013] The system disclosed by the present inventors utilizes an accurate motion estimation/compensation algorithm
so it is classified as a motion-compensated interframe method. The interlaced-to-progressive scan conversion
procedure contains two parts: (1) a motion-based compensation, and (2) a generalized sampling theorem. The motion-based
compensation essentially determines a set of samples at a time 2t, given samples at 2t-1 and 2t+1. In general,
these determined sets of samples will not lie on the image grid at time 2t, since the motion vector between 2t-1 and
2t+1 is arbitrary. Therefore, a generalized sampling theorem is used to compute the missing samples at the grid points
given the motion compensated samples and the samples which already exist. Formally, this can be expressed as: first,
find $\{I(m+\Delta_x, 2n+\Delta_y, 2t)\}$ given $\{I(m, 2n-1, 2t-1)\}$ and $\{I(m, 2n+1, 2t+1)\}$; then find $\{I(m, 2n+1, 2t)\}$
given $\{I(m, 2n, 2t)\}$ and $\{I(m+\Delta_x, 2n+\Delta_y, 2t)\}$.

[0014] While the invention will be described hereinafter in terms of a preferred embodiment and one or more pre- 
ferred applications, it will be understood by persons skilled in this art that various modifications may be made without 
departing from the actual scope of this invention, which is described hereinafter with reference to the drawing. 
[0015] In accordance with a further aspect of the present invention, a method of image data interpolation comprises
decoding true motion vector data associated with blocks of digitally encoded image information, with the true motion 
vector data being dependent in part on neighboring image block proximity weighting factors. The method further com- 
prises interpolating from the supplied picture information, image sequence signal data corresponding to intermediate 
image time intervals absent from the supplied picture information, the absent image time intervals corresponding to
intermediate image information occurring during the intermediate image time intervals in sequence between supplied 
image time intervals associated with the supplied picture information. The interpolation comprises the steps of constructing
image pixels for image blocks in each intermediate image time interval, based upon corresponding pixels in
corresponding blocks in the supplied picture information which occur immediately before and after the intermediate
image time interval, by distributing to constructed image pixels a fractional portion of image intensity difference informa- 
tion between the corresponding pixels occurring before and after the intermediate time to produce averaged intensity 
pixels for the intermediate image time interval. Thereafter, each constructed image pixel in each intermediate time interval
is associated with a corresponding true motion vector equal in magnitude to a fractional part of the true motion vector
information associated with the block in which the corresponding pixel is located in a reference supplied time
interval, the fractional part being determined according to the number of intermediate image time intervals inserted 
between supplied time intervals. Each constructed, averaged intensity pixel is then associated with a spatial location in 
the intermediate image time interval according to the fractional part of the corresponding decoded true motion vector. 

DRAWING

[0016] In the drawing, 

Figure 1 is a block diagram of a portion of an encoder adapted for use in connection with the present invention;
Figure 2 is a block diagram of a portion of an encoder of the same general type as is illustrated in the above-mentioned
first Sun and Vetro application;
Figure 3 is a block diagram of a portion of a video decoder adapted for use in connection with the present invention;
Figure 4 is an illustration which is helpful in understanding terminology used in connection with the frame-rate
up-conversion process, using transmitted motion information in accordance with one aspect of the present invention;
Figure 5 is a schematic illustration of certain aspects of motion interpolation as employed in a motion based
de-interlacing process according to another aspect of the invention;
Figure 6 is a schematic illustration of certain aspects of the use of a generalized sampling theorem in connection
with motion-based de-interlacing according to the present invention;
Figure 7 is a schematic illustration which is helpful in understanding a technique for accomplishing interlaced to
progressive scan conversion in a system employing the present invention; and
Figure 8 is a pictorial representation helpful in understanding principles of block motion compensation.

DETAILED DESCRIPTION 

[0017] In the video encoder shown in block diagram form in Figure 1, a digital video signal representing a sequence
of images 20, comprising spatially arranged video blocks 20A, 20B, 20C, etc., each made up of individual image pixels 
(or pels) - see Figure 7 - is supplied as a signal input to the video encoder. The video image information is supplied in 
a conventional sequence of luminance and chrominance information. 

[0018] In accordance with the present invention, as will be described more fully below, the incoming digital video
signal is supplied to a true motion estimation processor 22 to determine true motion vectors TMV representative of the
"best match" between, for example, a block of the current frame and a "corresponding" block of a previous frame as 
illustrated in Figure 8. The processor 22 provides an appropriate motion compensated prediction for the video signal. 
"Difference image" information is then subjected to a Discrete Cosine Transformation (DCT) 24 and the transformed 
information is subjected to quantization (Q) 26, each of which operations is conventional. Quantized transform coefficients
and quantizer indicator information are supplied to a variable length coding (VLC) encoder such as the general
arrangement shown in Figure 2. The quantized transform coefficients are subjected to Inverse Quantization (IQ) 28 and 
then to Inverse Discrete Cosine Transformation (IDCT) 30. The resulting information is coupled to a frame memory 32 
to provide a delay. Image information from a current image and a frame delayed image from frame memory 32 are compared
and processed, as will be explained below, to produce true motion vector information for application in accordance
with the present invention at the decoder of a system (see Figure 3). An encoder arrangement of the general type
shown in Figure 2, but not specifically directed to providing true motion vectors, is described in greater detail in the 
above-identified patent of Sun and Vetro, the disclosure of which is incorporated herein by reference. 
[0019] Algorithms for computing motion flow can be divided into two categories: those for removing temporal redundancy
and those for tracking physical motion.

[0020] The first set of motion estimation algorithms is aimed at removing temporal redundancy in the video compression
process. In motion pictures, similar scenes exist in a frame and a corresponding previous frame. In order to
minimize the amount of information to be transmitted, the MPEG video coding standards code the displaced difference 
block instead of the original block. For example, assume a block in a current frame is similar to a displaced block in a 
previous frame. (See for example, Figure 5 showing two odd fields and an intermediate even field in an interlaced system,
with the "similar" block shown by the dashed line outline). A motion prediction vector together with a residue (difference)
is coded. Since an achieved compression ratio will depend on removal of redundancy, the displacement vector
corresponding to minimal displaced frame difference (DFD) is often used in prior systems. 

[0021] A second type of motion estimation algorithm aims at accurately tracking the physical motion of features in
video sequences. A video sequence arises from putting a three dimensional (3D) real world onto a series of 2D images. 
When objects in the 3D real world move, the brightness (pixel intensity) of the 2D images changes correspondingly. The 
2D projection of the movement of a point in the 3D real world will be referred to herein as the "true motion." Computer 
vision, whose goal is to identify the unknown environment via the moving camera, is one of the many potential applications
of true motion information.

[0022] Many existing motion estimation algorithms attempt to optimize the search for a suitable or best motion vec- 
tor in the rate-distortion sense. Often, a complex scheme results and the motion vectors do not correspond to the true 
physical motion within the scene. 

[0023] In most video compression algorithms, there is a tradeoff among picture quality, compression ratio and computational
cost. Generally speaking, the lower the compression ratio, the better the picture quality.

[0024] In high quality video compression (e.g. video broadcasting), quantization scales are usually low. Therefore, 
the number of bits available for an inter frame residual block $B_{i\_res}$ is dominant with regard to the total bit rate $B_{total}$. Until
recently, it was generally believed that use of a smaller displaced frame difference (DFD or mean residue) would result 
in fewer bits to code the residual block, and thus a smaller total bit rate. Hence, the minimal DFD criterion is still widely 
used in BMAs (block matching algorithms), and the motion vector for any particular block is the displacement vector
which carries the minimal DFD. That is: 

$$\text{motion vector} = \arg\min_{v}\{\mathrm{DFD}(v)\} \tag{1}$$

[0025] However, it has been observed that full-search BMAs are computationally too costly for a practical real-time 
application, usually do not produce the true motion field, which could produce better subjective picture quality, and gen- 
erally cannot produce the optimal bit rate for very low bit rate video coding standards. 
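
The minimal-DFD criterion of Eq. (1) can be illustrated with a short full-search sketch. Several details are assumptions made for the example: the DFD is computed here as a sum of absolute differences, the motion vector `v = (dy, dx)` is taken to point from the current block to the matching block in the reference frame, and the names `dfd`, `full_search` and `radius` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]
Vector = Tuple[int, int]


def dfd(cur: Frame, ref: Frame, top: int, left: int, size: int, v: Vector) -> float:
    """Displaced frame difference, taken here as a sum of absolute differences."""
    dy, dx = v
    return sum(
        abs(cur[top + r][left + c] - ref[top + r + dy][left + c + dx])
        for r in range(size)
        for c in range(size)
    )


def full_search(cur: Frame, ref: Frame, top: int, left: int,
                size: int, radius: int) -> Tuple[Vector, float]:
    """Evaluate every displacement within +/-radius and keep the minimal DFD (Eq. (1))."""
    height, width = len(ref), len(ref[0])
    best_v, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose reference block leaves the frame.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = dfd(cur, ref, top, left, size, (dy, dx))
            if cost < best_cost:
                best_v, best_cost = (dy, dx), cost
    return best_v, best_cost


# Reference frame with a unique value at every position.
ref_frame = [[10 * r + c for c in range(6)] for r in range(6)]
# Current frame: the 2x2 block at (2, 2) is a copy of the reference block at (3, 1).
cur_frame = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        cur_frame[2 + r][2 + c] = ref_frame[3 + r][1 + c]

vector, cost = full_search(cur_frame, ref_frame, top=2, left=2, size=2, radius=2)
```

The search visits (2·radius + 1)² candidates per block, which indicates why the computational cost of a full search grows quickly with the search range.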

[0026] In most video coding standards, motion vectors are differentially encoded; therefore it is not always true that
a smaller DFD will result in a reduced bit rate. The reason is that the total number of bits, which includes the number of
bits for interframe residues, also includes the number of bits for coding motion vectors. Conventional BMAs treat the
motion estimation problem as an optimization problem on DFD only, hence they suffer from the high price of the differential
coding of motion vectors with large differences. The smaller the difference, the fewer bits are required. A rate-optimized
motion estimation algorithm, such as is shown below, should account for the total number of bits:

$$\begin{aligned}
\{v_i\}_{i=1}^{n} = \arg\min_{\{v_i\}} \{ & \mathrm{bits}(\mathrm{DFD}_1(v_1), Q_1) + \mathrm{bits}(\Delta v_1) \\
& + \mathrm{bits}(\mathrm{DFD}_2(v_2), Q_2) + \mathrm{bits}(\Delta v_2) \\
& + \cdots \\
& + \mathrm{bits}(\mathrm{DFD}_n(v_n), Q_n) + \mathrm{bits}(\Delta v_n) \}
\end{aligned} \tag{2}$$

where $v_i$ is the motion vector of block i, $\mathrm{DFD}_i(v_i)$ represents the DFD of block i, and
$\mathrm{bits}(\mathrm{DFD}_i(v_i), Q_i)$ is the number of bits required for this frame difference.

[0027] Spatial Correlation Technique . It has been found that the true motion field may be considered piecewise con- 
tinuous in the spatial domain. Therefore, the motion vectors can be more dependably estimated if the global motion 
trend of an entire neighborhood is considered, as opposed to that of one feature block itself. This approach enhances 
the chance that a singular and erroneous motion vector may be corrected by its surrounding motion vectors. For example,
assume that there is an object moving in a certain direction and a tracker fails to track its central block due to noise,
but successfully tracks its surrounding blocks. When a smoothness constraint is applied over the neighborhood, the true 
motion of the central block can be recovered. 

[0028] Spatial/Temporal Correlation Technique . The true motion field may also be considered to be piecewise con- 
tinuous in the temporal domain. That is, assuming that the motion of each block is much smaller than the size of the 
block, then the motion field is piecewise continuous in the temporal domain. Therefore, motion fields may not only be
considered to be piecewise continuous in the spatial domain (2D) but also piecewise continuous in the temporal domain 
(1D). The initial search area for matching thus can be reduced by exploiting correlations of motion vectors between spatially
and temporally adjacent blocks.

[0029] A piecewise continuous motion field is advantageous in reducing the bit rate for differentially encoded motion
vectors. Hence, a "true" motion tracker based on a neighborhood relaxation approach offers an effective approach for
rate-optimized motion estimation. In the context of neighborhood relaxation, Eq. (2) can be written as:

$$\begin{aligned}
\text{motion of } B_i &= \arg\min_{v}\left\{\frac{\alpha_1}{Q_i}\,\mathrm{DFD}_i(v) + \alpha_2\,|\Delta v|\right\} \\
&= \arg\min_{v}\left\{\mathrm{DFD}_i(v) + \beta\,|\Delta v|\right\}
\end{aligned} \tag{3}$$

where $\beta = \alpha_2 Q_i / \alpha_1$. The coefficients $\alpha_1$ and $\alpha_2$ are selected to provide a desired approximation
for the influence of neighboring blocks on the motion vector.

[0030] Assume that $B_j$ is a neighbor of $B_i$, $v_j^*$ is the optimal motion vector, and that $\mathrm{DFD}_j(v)$ increases as $v$
deviates from $v_j^*$ according to

$$\mathrm{DFD}_j(v) \approx \mathrm{DFD}_j(v_j^*) + \gamma\,|v - v_j^*|, \tag{4}$$

or,

$$|\Delta v| = |v - v_j^*| \approx \gamma^{-1}\left(\mathrm{DFD}_j(v) - \mathrm{DFD}_j(v_j^*)\right). \tag{5}$$

Substituting Eq. (5) into Eq. (3),

$$\begin{align}
\text{motion of } B_i \approx \arg\min_{v}\Big\{ & \mathrm{DFD}_i(v) \tag{6} \\
& + \mu \sum_{B_j \in N(B_i)} \left(\mathrm{DFD}_j(v) - \mathrm{DFD}_j(v_j^*)\right)\Big\}, \tag{7}
\end{align}$$

where $N(B_i)$ means the neighboring blocks of $B_i$ ($\mu = \beta/\gamma = \alpha_2 Q_i/(\gamma\alpha_1)$). Here we can use an idea
commonly adopted in relaxation methods, i.e., we can let $v_j^*$ (and $\mathrm{DFD}_j(v_j^*)$) remain constant during the block i
updating of the neighborhood relaxation. Therefore, they can be dropped from Eq. (7), resulting in

$$\text{motion of } B_i \approx \arg\min_{v}\Big\{\mathrm{DFD}_i(v) + \mu \sum_{B_j \in N(B_i)} \mathrm{DFD}_j(v)\Big\} \tag{8}$$


[0031] If a particular motion vector results in the DFDs of the center block and its neighbors dropping, then it is 
selected to be the motion vector for that block for the encoder. That is, when two motion vectors produce similar DFDs,
the one that is much closer to the neighbors' motion will be selected. The motion field produced by this method will be 
smoother than that of Eq. (1). 
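
The selection rule of Eq. (8) can be sketched on precomputed DFD values. The representation below, in which each block is described by a table mapping candidate vectors to DFD values, and the names `relaxed_motion` and `mu` are assumptions made for this illustration; the numbers are invented solely to show the effect.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, int]
DfdTable = Dict[Vector, float]  # candidate vector -> DFD of one block (assumed layout)


def relaxed_motion(center: DfdTable, neighbors: List[DfdTable], mu: float) -> Vector:
    """Pick the vector minimizing DFD_i(v) + mu * sum_j DFD_j(v), as in Eq. (8)."""
    def score(v: Vector) -> float:
        return center[v] + mu * sum(table[v] for table in neighbors)

    return min(center, key=score)


# Because of noise, the centre block slightly prefers (0, 0) on its own ...
center_dfd = {(0, 0): 10.0, (1, 0): 12.0}
# ... while its neighbours clearly move by (1, 0).
neighbor_dfds = [{(0, 0): 50.0, (1, 0): 5.0}, {(0, 0): 50.0, (1, 0): 5.0}]

minimal_dfd_choice = min(center_dfd, key=center_dfd.get)       # criterion of Eq. (1)
relaxed_choice = relaxed_motion(center_dfd, neighbor_dfds, 0.5)  # criterion of Eq. (8)
```

With `mu = 0.5` the scores are 60 for (0, 0) and 17 for (1, 0), so the relaxed criterion selects the vector that agrees with the neighborhood even though the centre block's own DFD is marginally higher for it.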

[0032] The above approach will be inadequate for non-translational motion, such as object rotation, zooming, and
approaching. For example, assume an object is rotating counterclockwise. Because Eq. (8) assumes the neighboring
blocks will move with the same translational motion, it may not adequately model the rotational motion. Since the neighboring
blocks may not have uniform motion vectors, a further relaxation on the neighboring motion vectors is introduced;
that is

$$\text{motion of } B_i = \arg\min_{v}\Big\{\mathrm{DFD}(B_i, v) + \sum_{B_j \in N(B_i)} \mu_{i,j} \times \mathrm{DFD}(B_j, v + \delta)\Big\} \tag{9}$$

where a small $\delta$ is incorporated to allow local variations of motion vectors among neighboring blocks due to the
non-translational motions, and $\mu_{i,j}$ is the weighting factor for different neighboring blocks. The shorter the distance between
$B_i$ and $B_j$, the larger will be $\mu_{i,j}$. Also, the larger the $Q_i$, the larger the $\mu_{i,j}$. In particular, we use the 4 nearest neighbors
with higher weighting for DFD's closer to the center. The inclusion of the $\delta$ vector allows reasonable tracking of more
complex motion such as rotation, zooming, shearing, and the like. Neighborhood relaxation will consider the global trend
in object motion as well as provide some flexibility to accommodate non-translational motion. Local variations $\delta$ among
neighboring blocks of Eq. (9) are included in order to accommodate
those other (i.e., non-translational) affine motions such as (a) rotation, and (b) zooming/approaching.
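
A possible reading of Eq. (9) is sketched below. The sketch assumes that, for each neighbor, the most favorable $\delta$ from a small candidate set is used (that is, the neighbor's DFD is minimized over $\delta$); this choice, the table-based representation, and the names `relaxed_motion_with_variation` and `deltas` are assumptions for illustration and are not stated in this form in the text. The numerical values are invented.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, int]
DfdTable = Dict[Vector, float]  # candidate vector -> DFD of one block (assumed layout)


def relaxed_motion_with_variation(center: DfdTable,
                                  neighbors: List[Tuple[float, DfdTable]],
                                  deltas: List[Vector]) -> Vector:
    """Score v by DFD(B_i, v) + sum_j mu_ij * DFD(B_j, v + delta).

    Assumption: each neighbor uses the delta (from `deltas`) that gives it the
    smallest DFD; vectors missing from a neighbor's table count as infinite cost.
    """
    def neighbor_cost(table: DfdTable, v: Vector) -> float:
        return min(table.get((v[0] + d[0], v[1] + d[1]), float("inf")) for d in deltas)

    def score(v: Vector) -> float:
        return center[v] + sum(mu * neighbor_cost(table, v) for mu, table in neighbors)

    return min(center, key=score)


center_dfd = {(0, 0): 5.0, (1, 0): 6.0, (1, 1): 15.0, (1, -1): 15.0}
# Two neighbors whose best vectors differ slightly from (1, 0), as under rotation.
neighbor_a = {(0, 0): 30.0, (1, 0): 32.0, (1, 1): 3.0, (1, -1): 40.0}
neighbor_b = {(0, 0): 30.0, (1, 0): 32.0, (1, 1): 40.0, (1, -1): 3.0}
weighted = [(0.5, neighbor_a), (0.5, neighbor_b)]

rigid = relaxed_motion_with_variation(center_dfd, weighted, [(0, 0)])
flexible = relaxed_motion_with_variation(center_dfd, weighted, [(0, 0), (0, 1), (0, -1)])
```

With no local variation the neighbors penalize (1, 0) and the static vector (0, 0) wins with a score of 35 against 38; allowing a one-pixel $\delta$ lets each neighbor match at its own nearby vector, and (1, 0) is selected with a score of 9.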
[0033] Referring to Figure 3, at the decoding end of the system, a received signal bitstream is provided to a Variable 
Length (VLC) Decoder 38 and the output of Decoder 38 is coupled to an Inverse Quantizer (IQ) 40. The True Motion 
Vector (TMV) information provided by the encoder (Fig. 1) in accordance with the present invention is extracted and is
supplied to motion compensation predictor 42. The main signal output of VLD 38/IQ 40 is subjected to Inverse Discrete
Cosine Transformation (IDCT) 44 and is combined at adder 46 with an output from motion compensation block 42. The 
combined output from adder 46 is supplied to a frame memory 48, the output of which is supplied to remaining post 
processing stages of the information processing receiver for post-processing to produce a desired image on a display. 
The True Motion Vector information available at the decoder is also supplied to the post-processing stage 50 to accomplish,
for example, a desired frame rate up-conversion or interlaced to progressive scan as explained below.

Frame-Rate Up-Conversion . 

[0034] Referring to the schematic diagram of Figure 4 of the drawing, if image block $B_i$ moves $v_i$ from frame $F_{t-1}$ to
frame $F_{t+1}$, then it is likely that block $B_i$ moves $v_i/2$ from frame $F_{t-1}$ to frame $F_t$, i.e., as depicted in Figure 4. The basic
technique of motion-based frame-rate up-conversion, which interpolates frame $F_t$ based on frame $F_{t-1}$, frame $F_{t+1}$, and
block motion vectors $\{v_i\}$, can be stated as follows:

$$\tilde{I}\left(p - \frac{v_i}{2},\, t\right) = \frac{1}{2}\left\{ I(p - v_i,\, t-1) + I(p,\, t+1) \right\} \quad \forall p \in B_i \tag{10}$$

where $p = [x, y]^T$ indicates the pixel location, $I(p,t)$ means the intensity of the pixel $[x,y]$ at time $t$, $\tilde{I}(p,t)$ means the
reconstructed intensity of the pixel $[x,y]$ at time $t$, and $v_i$ is the motion vector of block $B_i$.

[0035] Note that the more accurate the motion estimation, $v_i/2$, the smaller the reconstruction error, $\sum \|I(p,t) - \tilde{I}(p,t)\|$,
and the higher the quality of the motion based interpolation. Therefore, one possible technique of frame-rate up-conversion
using transmitted motion is to encode $F_{t-1}$, $F_{t+1}$, with $\{2v_i\}$ where $v_i$ is the motion vector of $B_i$ from $F_t$ to $F_{t+1}$.
The reconstruction error will be minimized, but the rate-distortion curves may not be optimized.

[0036] We can show that Eq. (9) captures the true movement of the block in the scene more accurately than Eq. (1)
does. Hence, it is most likely that $v_i/2$ using Eq. (9) is more accurate than $v_i/2$ using Eq. (1).

[0037] When a block $B_i$ is coded as the INTRA block (no motion compensation), it usually implies an uncovered
region (see Figure 4). Hence,

$$\tilde{I}(p,t) = I(p,\, t+1) \quad \forall p \in B_i \tag{11}$$

When $\tilde{I}(p,t)$ has never been assigned to any value by Eq. (10) and Eq. (11), it usually implies an occluded region. As a
result,

$$\tilde{I}(p,t) = I(p,\, t-1) \tag{12}$$
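
The three rules of Eq. (10), Eq. (11) and Eq. (12) can be combined in a short sketch that builds the middle frame from the two decoded frames and the decoded block motion vectors. The sketch makes several assumptions that are not in the text: block positions refer to frame t+1, motion vectors have even integer components so that $v_i/2$ falls on the pixel grid, and when two blocks write to the same pixel the block listed later wins. The names `Block` and `interpolate_middle_frame` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]
# (top, left, size, v): block position in frame t+1 and its motion vector
# v = (dy, dx) from frame t-1 to frame t+1; v is None for an INTRA block.
Block = Tuple[int, int, int, Optional[Tuple[int, int]]]


def interpolate_middle_frame(prev: Frame, nxt: Frame, blocks: List[Block]) -> Frame:
    """Interpolate frame t from frame t-1 (prev) and frame t+1 (nxt)."""
    height, width = len(nxt), len(nxt[0])
    out: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for top, left, size, v in blocks:
        for r in range(top, top + size):
            for c in range(left, left + size):
                if v is None:
                    out[r][c] = nxt[r][c]          # Eq. (11): uncovered region
                    continue
                dy, dx = v
                pr, pc = r - dy, c - dx            # p - v_i in frame t-1
                mr, mc = r - dy // 2, c - dx // 2  # p - v_i/2 in frame t
                if (0 <= pr < height and 0 <= pc < width
                        and 0 <= mr < height and 0 <= mc < width):
                    out[mr][mc] = 0.5 * (prev[pr][pc] + nxt[r][c])  # Eq. (10)
    # Eq. (12): pixels never assigned are treated as occluded and taken from t-1.
    return [
        [prev[r][c] if out[r][c] is None else out[r][c] for c in range(width)]
        for r in range(height)
    ]


# A 2x2 object (intensity 100) on a background of 10 moves four pixels to the right.
prev_frame = [[100.0, 100.0, 10.0, 10.0, 10.0, 10.0] for _ in range(2)]
next_frame = [[10.0, 10.0, 10.0, 10.0, 100.0, 100.0] for _ in range(2)]
blocks = [
    (0, 0, 2, None),    # uncovered background, INTRA-coded
    (0, 2, 2, (0, 0)),  # static background
    (0, 4, 2, (0, 4)),  # the moving object
]
middle = interpolate_middle_frame(prev_frame, next_frame, blocks)
```

In this example the object appears halfway along its path in the interpolated frame, the uncovered area is filled from frame t+1, and the area about to be covered is filled from frame t-1.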
[0038] For the more general problem of interpolating $F_t$ from $F_{t-m}$ and $F_{t+n}$, our method can be summarized as follows:

$$\hat{I}(p,t) = \sum_i w\left(p,\; p_i - \frac{n}{m+n} v_i\right) \times \left( I\left(p - \frac{m}{m+n} v_i,\; t-m\right) + I\left(p + \frac{n}{m+n} v_i,\; t+n\right) \right)$$

$$W(p,t) = \sum_i 2\, w\left(p,\; p_i - \frac{n}{m+n} v_i\right)$$

$$\tilde{I}(p,t) = \begin{cases} \hat{I}(p,t)/W(p,t) & \text{if } W(p,t) \neq 0 \\ I(p,\, t-1) & \text{if } W(p,t) = 0 \end{cases}$$

where $v_i$ is the movement of $B_i$ from frame $F_{t-m}$ to frame $F_{t+n}$, $p_i$ is the coordinate of $B_i$, and $w(\cdot,\cdot)$ is the window
function. $v_i = 0$ when $B_i$ is INTRA-coded, and $w(\cdot,\cdot)$ equals zero when $p$ is outside the $B_i$. In order to reduce the block
artifacts, the weighting value of $w(\cdot,\cdot)$ could be similar to the coefficients
defined for overlapped block motion compensation (OBMC).

INTERLACED-TO-PROGRESSIVE SCAN CONVERSION

[0039] As stated above, the present arrangement is particularly suited for use in connection with interlaced-to-progressive
scan conversion (see Figures 6 and 7). The first step is to perform true motion-based compensation at the
video decoder (See Figure 3) and obtain a missing set of samples on the 2t plane, and the second step is to use these
samples and the samples which exist for the preceding field to determine the samples for the missing field (See Fig. 6).
Some issues regarding motion-based compensation are discussed first. Then, our use of the generalized sampling the- 
orem is explained. Finally, a variety of different scenarios are considered so that the two steps can be put together to 
achieve a practical scan conversion scheme that yields a high quality output. 

[0040] Prior methods are known which utilize a 3D recursive-search block matcher to estimate motion to 1/4- to
1/2-pixel accuracy. The present method provides even higher accuracy using its generalized sampling theorem. The
high-precision TMT vertically integrates two parts: (1) a matching-based TMT as the base, and (2) a gradient-based motion
vector refinement. Gradient-based approaches are accurate at finding motion vectors at less than one-pixel resolution.
[0041] Most previous methods use only causal information (they never use the next fields) and perform motion estimation
based on the current field information. The present method, on the other hand, uses non-causal information: motion
estimation is performed making use of both the previous field and the next field. By assuming that the motion of a block is
almost linear over a very small period of time, we can linearly interpolate the motion vectors related to the current field.
In addition, because the information from previous and next fields is available, the non-grid pixels of the current field are
bi-directionally interpolated for higher precision. Furthermore, previous or next odd fields are used for the motion
estimation of the current odd field (see Figure 7), and previous or next even fields are used for the motion estimation of
the current even field. Odd fields are not used for even field motion estimation or vice versa. Thus, only similarly (odd or
even) numbered fields are compared. Most pixels in the odd field will stay in the odd field (e.g., non-motion background,
horizontal panning regions). Therefore, using previous or next odd fields for the motion estimation of the current odd field
is adequate. Only when there is an odd-pixel vertical movement will a pixel in the odd field move to the even field.
However, when there is an odd-pixel vertical movement, the lost information in the current odd field is also lost in the
previous even field. In that case it is unnecessary to track the motion.

[0042] The present method adaptively combines line averaging and motion compensated deinterlacing techniques.
Different weightings are assigned to a motion compensated, sampled point based on its position. When the
motion compensated, sampled point has the same position as a missing pixel (e.g., in a non-motion region), it has the
highest reliability. On the other hand, when the motion compensated, sampled point has the same position as an existing
pixel, it has the lowest reliability. In addition, the reliability of the motion vector also influences the reliability of the
motion compensated, sampled point.

[0043] The first step of the deinterlacing approach is to perform motion-based compensation and obtain a missing
set of samples on the 2t (even field) plane, as shown in Figure 5. There are many approaches to obtain a missing set
of samples $\{I(m+\Delta_x, 2n+\Delta_y, 2t)\}$ on the 2t plane, such as:

1. we can find $I(m+v_x, 2n+1+v_y, 2t) = I(m, 2n+1, 2t+1) = I(m+2v_x, 2n+1+2v_y, 2t-1)$ from the motion
estimation between the preceding odd field (2t-1) and the following odd field (2t+1);
2. we can find $I(m+v_x, 2n+v_y, 2t) = I(m, 2n, 2t+2)$ from the motion estimation between the even field (2t) and the
following even field (2t+2);
3. we can find $I(m+v_x, 2n+v_y, 2t) = I(m, 2n, 2t-2)$ from the motion estimation between the even field (2t) and the
preceding even field (2t-2).

Since $\Delta_x$ and $\Delta_y$ require high accuracy, the true motion tracking for this application requires higher
accuracy and precision than the true motion tracking for compression purposes. Our high-precision true motion tracker
vertically integrates two parts: (1) a matching-based true motion tracker as the base, and (2) a gradient-based motion
vector refinement. Our matching-based true motion tracker, which uses a neighborhood relaxation formulation, is very
dependable. However, it can only find full-pel motion vectors. That is, the precision of the estimated motion vectors
cannot be finer than one pixel. On the other hand, the precision of the motion vectors estimated by gradient-based
techniques can be much finer. Therefore, gradient-based techniques should be exploited in high-precision motion
estimation.
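The text does not spell out the gradient-based refinement, so the following is only a hedged sketch of one standard first-order (optical-flow style) estimate, shown in one dimension. It assumes that the full-pel vector has already been removed by the matching-based tracker and that the remaining displacement d is a fraction of a pixel, so that shifted[x] - reference[x] is approximately -d times the spatial gradient. The function name `refine_shift` and the central-difference gradient are assumptions made for illustration.

```python
import math


def refine_shift(reference, shifted):
    """Least-squares, first-order estimate of a sub-pixel shift d such that
    shifted[x] is approximately reference[x - d]."""
    num = 0.0
    den = 0.0
    for x in range(1, len(reference) - 1):
        grad = (reference[x + 1] - reference[x - 1]) / 2.0  # central difference
        num += grad * (shifted[x] - reference[x])
        den += grad * grad
    # A flat block carries no gradient information; report no refinement.
    return -num / den if den > 0 else 0.0


# A smooth test signal displaced by a quarter of a pixel.
reference = [math.sin(0.3 * x) for x in range(64)]
shifted = [math.sin(0.3 * (x - 0.25)) for x in range(64)]
residual_shift = refine_shift(reference, shifted)
```

For this smooth signal the estimate lands close to the true quarter-pixel shift. The refined vector would then be the full-pel vector plus `residual_shift`. The estimate is only trustworthy for small residual displacements and textured blocks.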

[0044] Once a new set of motion compensated samples has been found on the 2t plane, we must then use those
samples to determine a set of samples which lie on the sampling grid for the missing field, as shown in Figure 6. Even
though this sampling and reconstruction problem is two-dimensional, it is assumed that the signal is separable.
Therefore, if $\{I(m, 2n, 2t),\ I(m+\Delta_x, 2n+\Delta_y, 2t)\}$ are given, it actually takes two steps to find
$\{I(m, 2n+1, 2t)\}$. That is,

For Horizontal Interpolation: Given $\{I(m+\Delta_x, 2n+\Delta_y, 2t)\}$, find $\{I(m, 2n+\Delta_y, 2t)\}$. Because there are
enough horizontal samples at the Nyquist rate,

$$I(x, 2y+\Delta_y, 2t) = \sum_m I(m+\Delta_x, 2y+\Delta_y, 2t)\; \mathrm{sinc}(x - m - \Delta_x) \qquad (13)$$

For Vertical Interpolation: Given $\{I(m, 2n, 2t),\ I(m, 2n+\Delta_y, 2t)\}$, find $\{I(m, 2n+1, 2t)\}$. Since
$0 < \Delta_y < 2$, the generalized sampling theorem is used as:

$$I^{(k)}(x, 2y+1, 2t) = I^{(k-1)}(x, 2y+1, 2t) + \frac{ I(x, 2y+\Delta_y, 2t) - \sum_n I(x, 2n, 2t)\, \mathrm{sinc}(2y+\Delta_y-2n) - \sum_n I^{(k-1)}(x, 2n+1, 2t)\, \mathrm{sinc}(2y+\Delta_y-2n-1) }{ \mathrm{sinc}(\Delta_y - 1) }, \quad \forall\, y \qquad (14)$$

where the superscript $(k)$ denotes the iteration index.
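The vertical step can be illustrated for a single image column. The sketch below is an assumption-laden reading of the iterative update, not a definitive implementation: it assumes that the residual at each motion-compensated sample is divided by sinc(Δy - 1), which makes the update a Jacobi-style iteration, and it starts from a line-averaged estimate. The names `reconstruct_odd_lines`, `even`, `shifted` and `delta_y`, the fixed iteration count, and the finite column with zero intensity outside it are all hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct_odd_lines(even, shifted, delta_y, iterations=60):
    """Estimate the odd-line samples of one column.

    even[y]    : existing samples at rows 2y
    shifted[y] : motion-compensated samples at rows 2y + delta_y
    """
    half = len(even)
    # Initial estimate: average of the vertically adjacent even lines.
    odd = [(even[y] + even[min(y + 1, half - 1)]) / 2.0 for y in range(half)]
    gain = sinc(delta_y - 1.0)
    if abs(gain) < 1e-6:
        # delta_y near 0 or 2: the extra samples carry no new information,
        # so keep the intra-field (line-averaged) estimate.
        return odd
    for _ in range(iterations):
        updated = []
        for y in range(half):
            pos = 2 * y + delta_y
            predicted = sum(even[n] * sinc(pos - 2 * n)
                            + odd[n] * sinc(pos - 2 * n - 1)
                            for n in range(half))
            updated.append(odd[y] + (shifted[y] - predicted) / gain)
        odd = updated
    return odd


# Synthetic column: rows 0..7, with the odd rows treated as missing.
column = [3.0, 7.0, 1.0, 4.0, 6.0, 2.0, 5.0, 8.0]
delta_y = 0.7
even = column[0::2]
shifted = [sum(column[k] * sinc(2 * y + delta_y - k) for k in range(len(column)))
           for y in range(len(even))]
estimate = reconstruct_odd_lines(even, shifted, delta_y)
```

In this synthetic case the iteration recovers the odd rows of `column`. Convergence is not guaranteed in general: it degrades as Δy approaches 0 or 2, which is why the sketch falls back to line averaging there.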
[0045] The above method works well. However, there are two special cases which require further attention.

1. Object Occlusion and Reappearance: As mentioned before, object occlusion and reappearance make motion
estimation and motion-based frame-rate up-conversion more difficult. In this motion-compensated deinterlacing
problem, the motion vectors in object occlusion and reappearance regions are ignored and intra-field information
(e.g., line-averaging techniques) is used instead.

2. Field Motion Singularity: Where an object is moving upward or downward at (2n+1) pixels per field ($\Delta_y = 0$),
multiple fields do not provide more information than a single field. Therefore, intra-field information (e.g.,
line-averaging techniques) should be used.

The present method can be summarized as follows:

$$\Delta(i, 2j+1) = \sum_{B_i} \sum_{(x,y) \in B_i} w(x,y,B_i)\, \delta(i, x+v_{xi})\, \delta(2j+1, y+v_{yi}) \Big\{ \tfrac{1}{2}\big[ I(x, y, 2t-1) + I(x+2v_{xi}, y+2v_{yi}, 2t+1) \big] - \sum_m \sum_n I(m, 2n, 2t)\, \mathrm{sinc}(x+v_{xi}-m)\, \mathrm{sinc}(y+v_{yi}-2n) - \sum_m \sum_n I^{(k)}(m, 2n+1, 2t)\, \mathrm{sinc}(x+v_{xi}-m)\, \mathrm{sinc}(y+v_{yi}-2n-1) \Big\}$$

$$W(i, 2j+1) = \sum_{B_i} \sum_{(x,y) \in B_i} w(x,y,B_i)\, \delta(i, x+v_{xi})\, \delta(2j+1, y+v_{yi})\; \mathrm{sinc}(x+v_{xi}-i)\, \mathrm{sinc}(y+v_{yi}-2j-1)$$

$$I^{(k+1)}(i, 2j+1, 2t) = I^{(k)}(i, 2j+1, 2t) + \frac{\Delta(i, 2j+1)}{W(i, 2j+1)}$$

where $2v_i = (2v_{xi}, 2v_{yi})$ is the movement of $B_i$ from field $f_{2t-1}$ to field $f_{2t+1}$, $\delta(a,b) = 1$ when
$|a - b| < 1$ and $\delta(a,b) = 0$ otherwise, $w(\cdot,\cdot,\cdot)$ is the window function, and $I^{(0)}(i, 2j+1, 2t)$ is
from the line averaging using the field $f_{2t}$. That is,

$$I^{(0)}(i, 2j+1, 2t) = \sum_n I(i, 2n, 2t)\; \mathrm{sinc}\!\left(\frac{2j+1-2n}{2}\right)$$

$w(x,y,B_i) = 0$ whenever $(x,y)^T$ is outside $B_i$. In order to reduce block artifacts, the weighting values of
$w(\cdot,\cdot,\cdot)$ could be similar to the coefficients defined for overlapped block motion compensation (OBMC).


Generalized Sampling Theorem 

[0046] In the interlaced-to-progressive scan conversion method, a generalized sampling theorem is used. The
Sampling Theorem itself is well known: if $f(t)$ is a 1-D function having a Fourier transform $F(\omega)$ such that
$F(\omega) = 0$ for $|\omega| > \omega_0 = \pi/T_s$ (a band-limited signal), and is sampled at the points $t_n = nT_s$
(Nyquist rate), then $f(t)$ can be reconstructed exactly from its samples $\{f(nT_s)\}$ as follows:

$$f(t) = \sum_{n=-\infty}^{\infty} f(nT_s)\; \mathrm{sinc}\!\left(\frac{t - nT_s}{T_s}\right)$$

where $\mathrm{sinc}(0) = 1$ and $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$ whenever $x \neq 0$.
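As a small numerical illustration of the sinc reconstruction, the sketch below evaluates the interpolation sum over a finite list of samples. The truncation to a finite number of samples and the names `reconstruct`, `samples` and `t_s` are assumptions made for the example; the theorem itself concerns an infinite sample sequence.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, t_s, t):
    """Evaluate sum_n f(n T_s) sinc((t - n T_s) / T_s) over the given samples."""
    return sum(value * sinc((t - n * t_s) / t_s)
               for n, value in enumerate(samples))


samples = [0.0, 2.0, 5.0, 3.0, 1.0]            # f(n T_s) for n = 0..4
at_sample = reconstruct(samples, 0.5, 1.0)     # t = 2 T_s, a sampling instant
between = reconstruct(samples, 0.5, 1.25)      # t midway between two samples
```

At a sampling instant every sinc term except one vanishes, so `at_sample` reproduces the stored sample (up to floating-point rounding), while `between` is a weighted blend of all samples.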




[0047] Since the sampling theorem was first introduced, it has been generalized in various ways. One Generalized
Sampling Theorem is the following. If $f(t)$ is band-limited to $\omega_0 = \pi/T_s$ and is sampled at 1/2 the Nyquist
rate but, in each sampling interval, not one but two samples are used (bunched samples), then $f(t)$ can be reconstructed
exactly from its samples $\{f(2nT_s + \Delta T_k) \mid 0 < \Delta T_k < 2T_s,\ k = 1, 2\}$.


True Motion Information for Error Concealment

[0048] True motion vectors can also be used for better error concealment. Error concealment is intended to recover
the loss due to channel noise (e.g., bit errors in noisy channels, cell loss in ATM networks) by utilizing available picture
information. Error concealment techniques can be categorized into two classes, according to the roles that the
encoder and decoder play in the underlying approaches. Forward error concealment includes methods that add
redundancy at the source (encoder) end to enhance the error resilience of the coded bit streams. For example, I-picture
motion vectors were introduced in MPEG-2 to improve error concealment. However, syntax changes are required. Error
concealment by post-processing refers to operations at the decoder to recover damaged picture areas, based on
characteristics of image and video signals.

[0049] The present method is a post-processing error concealment method that uses motion-based temporal
interpolation for damaged image regions. This method uses true motion estimation at the encoder. The syntax
is not changed and thus no additional bits are required. Using the true motion vectors for video coding can even
optimize the bit rate for residual and motion information. Using true motion vectors for coding offers significant
improvement in motion-compensated frame-rate up-conversion over the minimal-residue BMA. The more accurate the
motion estimation, the better the performance of frame-rate up-conversion. Because the error concealment problem is
similar to the frame-rate up-conversion problem when the error is the whole frame, one can also interpolate the damaged
image regions more readily by making use of the true motion vectors as described above.

[0050] While the foregoing invention has been described in terms of one or more preferred embodiments, it will be
apparent to persons skilled in this art that various modifications may be made without departing from the scope of the
invention, which is set forth in the following claims.

Claims

1. In an electronic image sequence reproduction system, wherein digitally encoded picture information is supplied,
including true motion vector data associated with individual blocks of image information, a method of image data
interpolation comprising:

decoding from said digitally encoded picture information said true motion vector data for said blocks of image
information, with said true motion vector data being dependent in part on neighboring image block proximity
weighting factors;

interpolating from supplied picture information, image sequence signal data corresponding to intermediate
image time intervals absent from said supplied picture information, said absent image time intervals
corresponding to intermediate image information occurring during said intermediate image time intervals in
sequence between supplied image time intervals associated with said supplied picture information, said
interpolating comprising the steps of

constructing image pixels for image blocks in each said intermediate image time interval, based upon
corresponding pixels in corresponding blocks in said supplied picture information occurring immediately before and
after said intermediate image time interval, by distributing to constructed image pixels a fractional portion of
image intensity difference information between said corresponding pixels occurring before and after said
intermediate time to produce averaged intensity pixels for said intermediate image time interval;

associating with each constructed image pixel in each said intermediate time interval a corresponding true
motion vector equal in magnitude to a fractional part of the true motion vector information associated with the
block in which the corresponding pixel is located in a reference supplied time interval, said fractional part being
determined according to the number of intermediate image time intervals inserted between supplied time
intervals; and

associating each said constructed averaged intensity pixel with a spatial location in said intermediate image
time interval according to said fractional part of said corresponding decoded true motion vector.

2. The method of claim 1 wherein:

said neighboring block proximity weighting factors include, with respect to each block of image information,
motion vector data associated with image blocks immediately adjacent to and further remote from said each
image block, said true motion vector data being determined by summing motion vectors associated with said
immediately adjacent and remote blocks, where said immediately adjacent parameters are weighed greater
than said remote parameters in said summing.


3. The method of claim 2 wherein:

said absent intermediate image time intervals correspond to omitted pairs of even and odd fields of image
information, each said pair comprising a frame of image information, and
said steps of constructing and associating true motion vectors with said image pixels in absent even and odd
fields comprises comparing odd field information only with odd field information and comparing even field
information only with even field information to determine said portion of image intensity difference and part of true
motion vector information to be associated with image pixels in said absent fields.

4. The method of claim 3 wherein:

said supplied image time intervals correspond to 1/Nth of successive time intervals in an electronic image
sequence, where N is an integer greater than one, and

said fractional portion of image intensity difference and said fractional part of true motion information is 1/Nth
said difference and said information, respectively.

5. In an electronic digital image sequence reproduction system, wherein digitally encoded picture information in frame
format is supplied, including true motion vector data associated with individual blocks of image information, which
motion vector data is dependent upon proximity weighted displacement parameters of neighboring blocks, a
method of frame rate up-conversion comprising:

decoding from said digitally encoded picture information said true motion vector data for each said block of
image information in each transmitted frame,
interpolating image sequence signal data corresponding to intermediate frames occurring in time sequence
between transmitted frames spaced apart by a time interval T, said intermediate frames occurring at time
intervals T/N, where N is a whole number greater than one, by

constructing pixels for each said intermediate frame by averaging, with respect to each block in an intermediate
frame, intensity information for corresponding pixels in each block in the transmitted frames adjacent in time to
said intermediate frame,

associating with each averaged intensity pixel in each said intermediate frame a corresponding true motion
vector equal in magnitude to 1/Nth the magnitude of the true motion vector information associated with the
corresponding block in which said pixel is located in an immediately succeeding transmitted frame, and
associating each said averaged intensity pixel with a spatial location in said intermediate frame according to
said corresponding decoded true motion vector.

6. The method of frame rate up-conversion according to claim 5 wherein:

said neighboring block proximity weighting factors include, with respect to each block of image information,
motion vector data associated with image blocks immediately adjacent to and further remote from said each
image block, said true motion vector data being determined by summing motion vectors associated with said
immediately adjacent and remote blocks, where said immediately adjacent parameters are weighed greater
than said remote parameters in said summing.


7. The method of frame rate up-conversion according to claim 6 wherein:

said absent intermediate image time intervals correspond to omitted pairs of even and odd fields of image
information, each said pair comprising a frame of image information, and
said steps of constructing and associating true motion vectors with said image pixels in absent even and odd
fields comprises comparing odd field information only with odd field information and comparing even field
information only with even field information to determine said portion of image intensity difference and part of true
motion vector information to be associated with image pixels in said absent fields.

8. The method of frame rate up-conversion according to claim 7 wherein:

said supplied image time intervals correspond to 1/Nth of successive time intervals in an electronic image
sequence, where N is an integer greater than one, and

said fractional portion of image intensity difference and said fractional part of true motion information is 1/Nth
said difference and said information, respectively.

9. The method of claim 1 wherein:

said absent intermediate image time intervals correspond to omitted pairs of odd numbered lines in an even
numbered field of image information, and

said steps of constructing and associating true motion vectors with said image pixels in absent odd lines
comprises comparing odd line information only with odd line information to determine said portion of image
intensity difference and part of true motion vector information to be associated with image pixels in said absent
lines.


10. The method of claim 9 wherein:

said neighboring block proximity weighting factors include, with respect to each block of image information,
motion vector data associated with image blocks immediately adjacent to and further remote from said each
image block, said true motion vector data being determined by summing motion vectors associated with said
immediately adjacent and remote blocks, where said immediately adjacent parameters are weighed greater
than said remote parameters in said summing.

11. The method of claim 1 wherein:

said absent intermediate image time intervals correspond to omitted pairs of even numbered lines in an odd
numbered field of image information, and

said steps of constructing and associating true motion vectors with said image pixels in absent even lines
comprises comparing even line information only with even line information to determine said portion of image
intensity difference and part of true motion vector information to be associated with image pixels in said absent
lines.

12. The method of claim 11 wherein:

said neighboring block proximity weighting factors include, with respect to each block of image information,
motion vector data associated with image blocks immediately adjacent to and further remote from said each
image block, said true motion vector data being determined by summing motion vectors associated with said
immediately adjacent and remote blocks, where said immediately adjacent parameters are weighed greater
than said remote parameters in said summing.

13. The method of claim 12 and further comprising:

comparing said constructed image pixels for said absent lines with corresponding image pixels in said field of
image information to determine a spatial position for said constructed image pixels.

14. In a digital image sequence reproduction system wherein image fields are made up of blocks of pixels, at least
some of which pixels are spatially displaced in one field compared to spatial positions of corresponding pixels in a
successive field, a method of determining true motion vector data associated with feature blocks comprising:

selecting, from among all of the blocks of pixels in a field, candidate feature blocks having intensity variance
between pixels in said blocks above a threshold which indicates the presence of prominent texture features,
comparing candidate feature blocks with blocks in similarly numbered adjacent field intervals to determine sets
of displaced frame difference (DFD) residue parameters for each said candidate feature block,
comparing said sets of DFD residue parameters of said candidate blocks against a lower residue threshold and
an upper residue limit,
identifying unconditionally acceptable motion vectors for said candidate feature blocks as those motion vectors
corresponding to instances where said DFD parameters are less than said lower threshold,
determining rejected motion vectors for said candidate feature blocks as those corresponding to instances
where said DFD parameters are greater than said upper limit, and
determining conditionally acceptable motion vectors for said candidate feature blocks as those corresponding
to instances where said DFD parameters are between said threshold and said upper limit,
determining a global motion trend of a neighborhood around each said candidate feature block for which
acceptable or conditionally acceptable motion vectors have been determined by applying each of said
acceptable and conditionally acceptable motion vectors to the corresponding candidate feature block and its
neighboring blocks in a predetermined spatial neighborhood of each of said candidate feature blocks,
calculating, for each of said acceptable and conditionally acceptable motion vectors, a weighted score of a sum
of residues for said corresponding candidate feature block and its neighboring blocks in said neighborhood of
said candidate feature block, and
selecting, as the true motion vector for each candidate feature block, the motion vector corresponding to a
minimum weighted score.

A method of determining true motion vectors.